103 research outputs found

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Get PDF
    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards

    Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.

    Get PDF
    "Α picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable for enhancing visualization further

    Arena3D: visualization of biological networks in 3D

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Complexity is a key problem when visualizing biological networks; as the number of entities increases, most graphical views become incomprehensible. Our goal is to enable many thousands of entities to be visualized meaningfully and with high performance.</p> <p>Results</p> <p>We present a new visualization tool, Arena3D, which introduces a new concept of staggered layers in 3D space. Related data – such as proteins, chemicals, or pathways – can be grouped onto separate layers and arranged via layout algorithms, such as Fruchterman-Reingold, distance geometry, and a novel hierarchical layout. Data on a layer can be clustered via k-means, affinity propagation, Markov clustering, neighbor joining, tree clustering, or UPGMA ('unweighted pair-group method with arithmetic mean'). A simple input format defines the name and URL for each node, and defines connections or similarity scores between pairs of nodes. The use of Arena3D is illustrated with datasets related to Huntington's disease.</p> <p>Conclusion</p> <p>Arena3D is a user friendly visualization tool that is able to visualize biological or any other network in 3D space. It is free for academic use and runs on any platform. It can be downloaded or lunched directly from <url>http://arena3d.org</url>. Java3D library and Java 1.5 need to be pre-installed for the software to run.</p

    A reference guide for tree analysis and visualization

    Get PDF
    The quantities of data obtained by the new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and the large-scale OMICS-approaches, such as genomics, proteomics and transcriptomics, are becoming vast. Sequencing technologies become cheaper and easier to use and, thus, large-scale evolutionary studies towards the origins of life for all species and their evolution becomes more and more challenging. Databases holding information about how data are related and how they are hierarchically organized expand rapidly. Clustering analysis is becoming more and more difficult to be applied on very large amounts of data since the results of these algorithms cannot be efficiently visualized. Most of the available visualization tools that are able to represent such hierarchies, project data in 2D and are lacking often the necessary user friendliness and interactivity. For example, the current phylogenetic tree visualization tools are not able to display easy to understand large scale trees with more than a few thousand nodes. In this study, we review tools that are currently available for the visualization of biological trees and analysis, mainly developed during the last decade. We describe the uniform and standard computer readable formats to represent tree hierarchies and we comment on the functionality and the limitations of these tools. We also discuss on how these tools can be developed further and should become integrated with various data sources. Here we focus on freely available software that offers to the users various tree-representation methodologies for biological data analysis

    LAITOR - Literature Assistant for Identification of Terms co-Occurrences and Relationships

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Biological knowledge is represented in scientific literature that often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of interest that are specific of a determined research field. Therefore, the study of the current literature about a selected topic deposited in public databases, facilitates the generation of novel hypotheses associating a set of bioentities to a common context.</p> <p>Results</p> <p>We created a text mining system (LAITOR: <it><b>L</b>iterature <b>A</b>ssistant for <b>I</b>dentification of <b>T</b>erms co-<b>O</b>ccurrences and <b>R</b>elationships</it>) that analyses co-occurrences of bioentities, biointeractions, and other biological terms in MEDLINE abstracts. The method accounts for the position of the co-occurring terms within sentences or abstracts. The system detected abstracts mentioning protein-protein interactions in a standard test (BioCreative II IAS test data) with a precision of 0.82-0.89 and a recall of 0.48-0.70. We illustrate the application of LAITOR to the detection of plant response genes in a dataset of 1000 abstracts relevant to the topic.</p> <p>Conclusions</p> <p>Text mining tools combining the extraction of interacting bioentities and biological concepts with network displays can be helpful in developing reasonable hypotheses in different scientific backgrounds.</p

    ReLiance: a machine learning and literature-based prioritization of receptor—ligand pairings.

    Get PDF
    Motivation: The prediction of receptor—ligand pairings is an important area of research as intercellular communications are mediated by the successful interaction of these key proteins. As the exhaustive assaying of receptor—ligand pairs is impractical, a computational approach to predict pairings is necessary. We propose a workflow to carry out this interaction prediction task, using a text mining approach in conjunction with a state of the art prediction method, as well as a widely accessible and comprehensive dataset. Among several modern classifiers, random forests have been found to be the best at this prediction task. The training of this classifier was carried out using an experimentally validated dataset of Database of Ligand-Receptor Partners (DLRP) receptor—ligand pairs. New examples, co-cited with the training receptors and ligands, are then classified using the trained classifier. After applying our method, we find that we are able to successfully predict receptor—ligand pairs within the GPCR family with a balanced accuracy of 0.96. Upon further inspection, we find several supported interactions that were not present in the Database of Interacting Proteins (DIPdatabase). We have measured the balanced accuracy of our method resulting in high quality predictions stored in the available database ReLiance. Availability: http://homes.esat.kuleuven.be/?bioiuser/ReLianceDB/ index.php Contact: [email protected]; ernesto.iacucci@gmail. com Supplementary information: Supplementary data are available at Bioinformatics onlin

    Arena3D: visualizing time-driven phenotypic differences in biological systems

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Elucidating the genotype-phenotype connection is one of the big challenges of modern molecular biology. To fully understand this connection, it is necessary to consider the underlying networks and the time factor. In this context of data deluge and heterogeneous information, visualization plays an essential role in interpreting complex and dynamic topologies. Thus, software that is able to bring the network, phenotypic and temporal information together is needed. Arena3D has been previously introduced as a tool that facilitates link discovery between processes. It uses a layered display to separate different levels of information while emphasizing the connections between them. We present novel developments of the tool for the visualization and analysis of dynamic genotype-phenotype landscapes.</p> <p>Results</p> <p>Version 2.0 introduces novel features that allow handling time course data in a phenotypic context. Gene expression levels or other measures can be loaded and visualized at different time points and phenotypic comparison is facilitated through clustering and correlation display or highlighting of impacting changes through time. Similarity scoring allows the identification of global patterns in dynamic heterogeneous data. In this paper we demonstrate the utility of the tool on two distinct biological problems of different scales. First, we analyze a medium scale dataset that looks at perturbation effects of the pluripotency regulator Nanog in murine embryonic stem cells. Dynamic cluster analysis suggests alternative indirect links between Nanog and other proteins in the core stem cell network. Moreover, recurrent correlations from the epigenetic to the translational level are identified. Second, we investigate a large scale dataset consisting of genome-wide knockdown screens for human genes essential in the mitotic process. Here, a potential new role for the gene <it>lsm14a </it>in cytokinesis is suggested. We also show how phenotypic patterning allows for extensive comparison and identification of high impact knockdown targets.</p> <p>Conclusions</p> <p>We present a new visualization approach for perturbation screens with multiple phenotypic outcomes. The novel functionality implemented in Arena3D enables effective understanding and comparison of temporal patterns within morphological layers, to help with the system-wide analysis of dynamic processes. Arena3D is available free of charge for academics as a downloadable standalone application from: <url>http://arena3d.org/</url>.</p
    corecore